# Legislator Information and Text-Based Ideal Point (TBIP) Values: Result Files, Replication Code, and Data for Creating the Main File Used for All Analyses in "Express Yourself (Ideologically): Legislators' Ideal Points Across Audiences"

### Code subdirectory structure

- **legislator_info_and_tbip_congresses_115_and_116.csv**: Intermediate data file with legislator information and TBIP values. The legislator information covers district characteristics, caucus memberships, and so on. This file supplies the TBIP scores used in the final dataset file.

- **codebook.txt**: Explains the columns/variables present in the above file.

- **supporting_data_files/**: Data files used to construct legislator_info_and_tbip_congresses_115_and_116.csv. All legislator information in that file (everything except the values derived from the TBIP model) comes from these files.

- **speeches_results/**: Results from TBIP on floor speeches (topics, topic proportions in various texts including the raw text, mean topic proportions for every legislator).

- **tweets_results/**: Results from TBIP on tweets (topics, topic proportions in various texts including the raw text, mean topic proportions for every legislator).

- **tbip/**: TBIP code, including the commands used to run it and the script for running issue-specific TBIP. Also includes the data files, the code that converts raw data into the clean data the TBIP code can use, and the analysis code that creates the result files. See the README in this subdirectory for instructions.

- **combine_and_create_main_file_for_conducting_research.ipynb**: Jupyter notebook that combines the result files (ideal point estimates) with the supporting data files (legislator information) to create the main CSV file used for the analyses in our research, legislator_info_and_tbip_congresses_115_and_116.csv.
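
The combination step described for the notebook can be pictured as a keyed join between legislator information and ideal point estimates. The following is only a minimal sketch of that idea, not the notebook's actual code: the `left_join` helper, the column names (`legislator_id`, `party`, `ideal_point_speeches`), and the toy values are all hypothetical. Consult `codebook.txt` for the real column names.

```python
import csv
import io


def left_join(info_rows, score_rows, key):
    """Attach score columns to each legislator row that shares `key`."""
    scores_by_key = {row[key]: row for row in score_rows}
    merged = []
    for row in info_rows:
        combined = dict(row)
        # Legislators without a matching score keep their info columns only.
        combined.update(scores_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


# Tiny in-memory stand-ins for a supporting data file and a TBIP result file.
# Column names and values are made up purely for illustration.
info_csv = "legislator_id,party\nA001,D\nB002,R\n"
scores_csv = "legislator_id,ideal_point_speeches\nA001,-0.8\nB002,0.6\n"

info_rows = list(csv.DictReader(io.StringIO(info_csv)))
score_rows = list(csv.DictReader(io.StringIO(scores_csv)))
merged = left_join(info_rows, score_rows, key="legislator_id")
```

A left join keeps every legislator from the information file even when no ideal point estimate exists for them, which makes missing scores easy to spot afterwards.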


### Code requirements

Python and Jupyter Notebook experience is required. 

- Use Python 3.6

- The Python libraries required to run the main code for computing TBIP scores are listed in `requirements.txt` in the `tbip/` subdirectory.
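
Because the code targets a specific Python version, it can help to confirm which interpreter a notebook kernel or script is actually using before installing anything. The check below is a small sketch; the `matches_required_version` helper is hypothetical and not part of this repository.

```python
import sys


def matches_required_version(current, required=(3, 6)):
    """Return True when the major and minor version equal `required`."""
    return tuple(current[:2]) == tuple(required)


if not matches_required_version(sys.version_info):
    # A warning rather than an error, so the check can run in any environment.
    print(
        "Warning: running Python %d.%d, but this code was written for Python 3.6"
        % (sys.version_info[0], sys.version_info[1])
    )
```

Running this in the first cell of a notebook shows quickly whether Jupyter picked up the intended environment.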


### Note: Our code for computing TBIP scores for legislators (using their speeches and tweets) builds on the original repository (https://github.com/keyonvafa/tbip) by Vafa et al. (2020).

If you use our TBIP scores or dataset in your research, or make use of the codebase, please cite the following in addition to our JOP paper:

Vafa, K., Naidu, S., & Blei, D. (2020, July). Text-Based Ideal Points. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5345-5357). 

### Caveats

- If you train and run the models from scratch, the ideal point values will not exactly match ours. They should be similar, rank legislators similarly, and reproduce our main findings.

- A GPU is required to run the TBIP model. The relevant steps are flagged in `tbip/README.md`, which also contains the full documentation for running the code.

- A high-memory CPU machine (at least 32 GB of RAM) is required to run various parts of the code.

- Experience with Python and Jupyter Notebook may be needed to handle differing library versions, for example setting up separate Anaconda or virtual environments for different parts of the code and configuring Jupyter Notebook to use the appropriate Python environment.
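
One way to check the first caveat, that retrained ideal points rank legislators similarly to the released ones, is a Spearman rank correlation between the two sets of values. The sketch below uses only the standard library. The function names and toy numbers are hypothetical, and it assumes both runs place legislators on an axis with the same orientation and list them in the same order.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either input is constant.
    return cov / (var_x * var_y) ** 0.5


# Made-up values standing in for released and retrained ideal points,
# listed in the same legislator order.
released = [-1.2, -0.4, 0.1, 0.9, 1.5]
retrained = [-1.0, -0.5, 0.2, 0.8, 1.7]
rho = spearman(released, retrained)
```

A value of `rho` close to 1 indicates that the two runs order legislators almost identically, even when the raw values differ.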